Prepare a data set $\mathbf{x} = (x_1, x_2, \ldots, x_N)$.
Design bins $B = (B_1, B_2, \ldots, B_M)$, such as the eight bins ($M = 8$) shown in the top-right panel of Figure 2.2.
Assign each data point $x_n$ to a bin $B_m$ if $x_n$ lies within the boundaries of $B_m$.
Count the data points ($x_n$) falling in each bin $B_m$, such as the 1,277 insertions in the first bin in Figure 2.2.
Denote the data point count of each bin as $f_m$ (frequency), as shown in the bottom-left panel of Figure 2.2.
Convert each frequency ($f_m$) to a probability ($p_m$), as shown in the bottom-right panel of Figure 2.2.
Use either $f_m$ or $p_m$ to interpret the estimated density.
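The binning and counting steps above can be sketched in a few lines of R. This is only an illustrative sketch: the data values and the equal-width bin boundaries are hypothetical and are not taken from Figure 2.2, and `cut` with `table` is one of several ways to do the assignment.

```r
# Hypothetical data: 12 values between 0 and 8 (not from the book).
x <- c(0.5, 1.2, 1.7, 2.3, 2.9, 3.1, 3.3, 4.8, 5.5, 6.1, 6.4, 7.9)

# Design M = 8 equal-width bins; these boundaries are an assumption.
breaks <- seq(0, 8, by = 1)

# Assign each data point to the bin whose boundaries contain it.
bin <- cut(x, breaks = breaks, include.lowest = TRUE)

# Count the data points in each bin (the frequency f_m).
f <- as.vector(table(bin))
print(f)
```

Every data point lands in exactly one bin, so the frequencies add up to the number of data points.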
The formula for converting a frequency histogram to a probabilistic histogram is defined as below,
$$p_m = \frac{f_m}{\sum_{k=1}^{M} f_k} \tag{2.1}$$
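As a small worked instance of Equation 2.1, the conversion is a division of each bin count by the total count. The frequencies below are hypothetical (four bins, chosen so the arithmetic is easy to check) and the variable names `f` and `p` are assumptions for illustration.

```r
# Hypothetical bin frequencies for M = 4 bins (not taken from Figure 2.2).
f <- c(10, 25, 40, 25)

# Equation 2.1: divide each frequency by the total count over all bins.
p <- f / sum(f)
print(p)

# The resulting probabilities sum to 1.
print(sum(p))
```

With a total of 100 data points, the four probabilities are 0.10, 0.25, 0.40 and 0.25.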
The two histogram statistics have different meanings. $f_m$ stands for the
count of the mth bin, while $p_m$ stands for the probability for data to
fall into the mth bin. The former can be used for interpreting or explaining
how data are distributed. The latter can be used for making an inference
about novel data. In other words, the probability of a bin can indicate how
likely a novel data point is to occur in the bin. For instance, the probabilities
that a novel transposon insertion statistic falls into the first bin and the
[...] bin were 8% and 19%, respectively, for the gene isftu1 shown in
the bottom-right panel of Figure 2.2.
The R function for the histogram algorithm is hist. Given a vector x,
the syntax of the hist function is shown below, where nclass is the
parameter that sets the number of bins,
hist(x, nclass, ...)
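A brief usage sketch, assuming simulated data rather than the transposon insertion data of Figure 2.2: calling hist with plot = FALSE returns the bin boundaries and counts without drawing, so the frequencies and probabilities can be inspected directly. Note that R treats nclass as a suggestion for the number of bins, so the number actually used may differ from the value requested.

```r
# Simulated data; the seed and the normal distribution are assumptions.
set.seed(1)
x <- rnorm(200)

# Request roughly 8 bins; R may choose a nearby number of "pretty" bins.
h <- hist(x, nclass = 8, plot = FALSE)

f <- h$counts      # frequency of each bin
p <- f / sum(f)    # probability of each bin

print(h$breaks)    # bin boundaries chosen by R
print(f)
print(round(p, 3))
```

Dropping plot = FALSE draws the frequency histogram as usual.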